Extreme Classification (XC) aims to map a query to the most relevant documents from a very large document set. XC algorithms used in real-world applications typically learn this mapping from datasets curated from implicit feedback, such as user clicks. However, these datasets often suffer from missing labels. In this work, we observe that systematically missing labels lead to missing knowledge, which is critical for modelling relevance between queries and documents. We formally show that this absence of knowledge is hard to recover using existing methods such as propensity weighting and data imputation strategies that rely solely on the training dataset. While Large Language Models (LLMs) offer an attractive solution for augmenting the missing knowledge, leveraging them in applications with low-latency requirements and large document sets is challenging. To mitigate missing knowledge at scale, we propose SKIM (Scalable Knowledge Infusion for Missing Labels), an algorithm that leverages a combination of Small Language Models (SLMs), e.g., Llama-2-7b, and abundant unstructured metadata to effectively address the missing-label problem. We show the efficacy of our method on large-scale public datasets through a combination of unbiased evaluation strategies, such as exhaustive human annotations and simulation-based evaluation benchmarks. SKIM outperforms existing methods on Recall@100 by more than 10 absolute points. Additionally, SKIM scales to proprietary query-ad retrieval datasets containing 10 million documents, outperforming baseline methods by 12% in offline evaluations and increasing ad click-yield by 1.23% in an online A/B test conducted on Bing Search. We release the code and trained models at: https://github.com/bicycleman15/skim